SMUG Planner: A Safe Multi-Goal Planner for Mobile Robots in Challenging Environments
Robotic exploration or monitoring missions require mobile robots to
autonomously and safely navigate between multiple target locations in
potentially challenging environments. Currently, this type of multi-goal
mission often relies on humans designing a set of actions for the robot to
follow in the form of a path or waypoints. In this work, we consider the
multi-goal problem of visiting a set of pre-defined targets, each of which
could be visited from multiple potential locations. To increase autonomy in
these missions, we propose a safe multi-goal (SMUG) planner that generates an
optimal motion path to visit those targets. To increase safety and efficiency,
we propose a hierarchical state validity checking scheme, which leverages
robot-specific traversability learned in simulation. We use LazyPRM* with an
informed sampler to accelerate collision-free path generation. Our iterative
dynamic programming algorithm enables the planner to generate a path visiting
more than ten targets within seconds. Moreover, the proposed hierarchical state
validity checking scheme reduces the planning time by 30% compared to pure
volumetric collision checking and increases safety by avoiding high-risk
regions. We deploy the SMUG planner on the quadruped robot ANYmal and show its
capability to guide the robot in multi-goal missions fully autonomously on
rough terrain.
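The abstract describes choosing a visiting order over many targets via dynamic programming on top of collision-free path costs. The sketch below is a minimal classic Held-Karp dynamic program over a pairwise cost matrix (e.g. costs returned by LazyPRM* queries); it illustrates only the generic multi-goal ordering problem, not the paper's iterative dynamic programming or its handling of multiple candidate locations per target.

```python
# Minimal Held-Karp sketch: pick a visiting order over targets given a matrix
# of pairwise collision-free path costs. This is an illustrative baseline, not
# the SMUG planner's iterative dynamic programming.
from itertools import combinations

def order_targets(cost, start=0):
    """cost[i][j]: path cost between targets i and j; returns a min-cost open tour from start."""
    n = len(cost)
    dp = {(1 << start, start): (0.0, None)}  # (visited_mask, last) -> (cost, predecessor state)
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            if start not in subset:
                continue
            mask = 0
            for i in subset:
                mask |= 1 << i
            for last in subset:
                if last == start:
                    continue
                prev_mask = mask ^ (1 << last)
                best = None
                for prev in subset:
                    if prev == last:
                        continue
                    state = (prev_mask, prev)
                    if state in dp:
                        cand = dp[state][0] + cost[prev][last]
                        if best is None or cand < best[0]:
                            best = (cand, state)
                if best is not None:
                    dp[(mask, last)] = best
    full = (1 << n) - 1
    end = min((last for (m, last) in dp if m == full), key=lambda l: dp[(full, l)][0])
    order, state = [], (full, end)
    while state is not None:               # walk predecessor states back to the start
        order.append(state[1])
        state = dp[state][1]
    return list(reversed(order))

# Example: symmetric matrix of pairwise path costs for four targets.
costs = [[0, 2, 9, 10],
         [2, 0, 6, 4],
         [9, 6, 0, 8],
         [10, 4, 8, 0]]
print(order_targets(costs))   # -> [0, 1, 3, 2]
```

For a handful of targets this exhaustive formulation already runs in well under a second; the planner described above additionally has to account for each target being reachable from several candidate poses.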
Few-Shot Audio-Visual Learning of Environment Acoustics
Room impulse response (RIR) functions capture how the surrounding physical
environment transforms the sounds heard by a listener, with implications for
various applications in AR, VR, and robotics. Whereas traditional methods to
estimate RIRs assume dense geometry and/or sound measurements throughout the
environment, we explore how to infer RIRs based on a sparse set of images and
echoes observed in the space. Towards that goal, we introduce a
transformer-based method that uses self-attention to build a rich acoustic
context, then predicts RIRs of arbitrary query source-receiver locations
through cross-attention. Additionally, we design a novel training objective
that improves the match in the acoustic signature between the RIR predictions
and the targets. In experiments using a state-of-the-art audio-visual simulator
for 3D environments, we demonstrate that our method successfully generates
arbitrary RIRs, outperforming state-of-the-art methods and -- in a major
departure from traditional methods -- generalizing to novel environments in a
few-shot manner. Project: http://vision.cs.utexas.edu/projects/fs_rir. Comment: Accepted to NeurIPS 2022.
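To make the self-attention/cross-attention pattern described above concrete, here is a hypothetical PyTorch sketch, not the authors' architecture: context tokens (fused image and echo embeddings from a few observed viewpoints) are encoded with self-attention, and an RIR for an arbitrary source-receiver pair is predicted through cross-attention. All dimensions, the RIR length, and the module names are assumptions for illustration.

```python
# Hypothetical sketch of the described attention pattern; sizes and names are assumed.
import torch
import torch.nn as nn

class RIRQueryModel(nn.Module):
    def __init__(self, d_model=256, n_heads=4, rir_len=4800):
        super().__init__()
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=4,
        )
        self.query_embed = nn.Linear(6, d_model)       # (source xyz, receiver xyz)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.rir_head = nn.Linear(d_model, rir_len)    # predicts an RIR waveform

    def forward(self, context_tokens, src_rcv_pos):
        # context_tokens: (B, N, d_model) fused image/echo features from sparse views
        # src_rcv_pos:    (B, 6) query source and receiver positions
        ctx = self.context_encoder(context_tokens)             # self-attention over context
        q = self.query_embed(src_rcv_pos).unsqueeze(1)         # (B, 1, d_model)
        fused, _ = self.cross_attn(q, ctx, ctx)                # query attends over acoustic context
        return self.rir_head(fused.squeeze(1))                 # (B, rir_len)

# Example: 3 observed viewpoints as context, one queried source-receiver pair.
model = RIRQueryModel()
rir = model(torch.randn(1, 3, 256), torch.randn(1, 6))
```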
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
Mobility in an effective and socially-compliant manner is an essential yet challenging task for robots operating in crowded spaces. Recent works have shown the power of deep reinforcement learning techniques to learn socially cooperative policies. However, their cooperation ability deteriorates as the crowd grows since they typically relax the problem to a one-way Human-Robot interaction problem. In this work, we want to go beyond first-order Human-Robot interaction and more explicitly model Crowd-Robot Interaction (CRI). We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework. Our model captures the Human-Human interactions occurring in dense crowds that indirectly affect the robot's anticipation capability. Our proposed attentive pooling mechanism learns the collective importance of neighboring humans with respect to their future states. Various experiments demonstrate that our model can anticipate human dynamics and navigate in crowds with time efficiency, outperforming state-of-the-art methods.
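The attentive pooling idea above, learning how much each neighboring human matters, can be sketched as a small PyTorch module that scores per-human interaction features and combines them into a fixed-size crowd representation. The dimensions and scoring MLP are assumptions; this is not the authors' exact architecture.

```python
# Hypothetical sketch of attentive pooling over per-human interaction features.
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # scores each robot-human interaction embedding with a small MLP
        self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, pairwise_feats):
        # pairwise_feats: (B, num_humans, feat_dim) robot-human interaction embeddings
        weights = torch.softmax(self.score(pairwise_feats), dim=1)   # relative importance per human
        crowd = (weights * pairwise_feats).sum(dim=1)                # (B, feat_dim) crowd state
        return crowd, weights.squeeze(-1)

# Example: 5 surrounding humans pooled into one crowd representation that a
# value network for deep-RL navigation could consume.
pool = AttentivePooling()
crowd_state, attn = pool(torch.randn(1, 5, 64))
```

Because the pooled representation has a fixed size regardless of crowd size, the downstream policy or value network does not need to change as the number of surrounding humans grows.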
Novel-View Acoustic Synthesis
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight
and sound observed at a source viewpoint, can we synthesize the sound of that
scene from an unseen target viewpoint? We propose a neural rendering approach:
Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize
the sound of an arbitrary point in space by analyzing the input audio-visual
cues. To benchmark this task, we collect two first-of-their-kind large-scale
multi-view audio-visual datasets, one synthetic and one real. We show that our
model successfully reasons about the spatial cues and synthesizes faithful
audio on both datasets. To our knowledge, this work represents the very first
formulation, dataset, and approach to solve the novel-view acoustic synthesis
task, which has exciting potential applications ranging from AR/VR to art and
design. Unlocked by this work, we believe that the future of novel-view
synthesis is in multi-modal learning from videos. Comment: Project page: https://vision.cs.utexas.edu/projects/nva
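At a high level, the task conditions the audio heard at the source viewpoint on visual cues and the target viewpoint to re-render the sound. The following is a generic conditioning sketch of that pattern only, not the ViGAS architecture; all module sizes and names are assumptions for illustration.

```python
# Generic audio re-rendering sketch conditioned on visual features and relative pose.
import torch
import torch.nn as nn

class ViewpointAudioRenderer(nn.Module):
    def __init__(self, audio_dim=257, visual_dim=512, pose_dim=4, hidden=256):
        super().__init__()
        self.cond = nn.Linear(visual_dim + pose_dim, hidden)   # fuse visual cues and target pose
        self.net = nn.Sequential(
            nn.Linear(audio_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, audio_dim),
        )

    def forward(self, source_spec, visual_feat, rel_pose):
        # source_spec: (B, T, audio_dim) spectrogram heard at the source viewpoint
        # visual_feat: (B, visual_dim) image features; rel_pose: (B, pose_dim) target viewpoint
        cond = self.cond(torch.cat([visual_feat, rel_pose], dim=-1)).unsqueeze(1)
        cond = cond.expand(-1, source_spec.size(1), -1)          # broadcast over time frames
        return self.net(torch.cat([source_spec, cond], dim=-1))  # target-viewpoint spectrogram

renderer = ViewpointAudioRenderer()
out = renderer(torch.randn(1, 100, 257), torch.randn(1, 512), torch.randn(1, 4))
```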